Results 1 - 20 of 27
1.
Sci Rep ; 14(1): 5956, 2024 03 12.
Article in English | MEDLINE | ID: mdl-38472298

ABSTRACT

Extensive research has been conducted on poverty in developing countries using conventional regression analysis, which has limited prediction capability. This study aims to address this gap by applying advanced machine learning (ML) methods to predict poverty in Somalia. Utilizing data from the first-ever 2020 Somalia Demographic and Health Survey (SDHS), a cross-sectional study design is considered. ML methods, including random forest (RF), decision tree (DT), support vector machine (SVM), and logistic regression, are tested and applied using R software version 4.1.2, while conventional methods are analyzed using STATA version 17. Evaluation metrics, such as confusion matrix, accuracy, precision, sensitivity, specificity, recall, F1 score, and area under the receiver operating characteristic (AUROC), are employed to assess the performance of predictive models. The prevalence of poverty in Somalia is notable, with approximately seven out of ten Somalis living in poverty, making it one of the highest rates in the region. Among nomadic pastoralists, agro-pastoralists, and internally displaced persons (IDPs), the poverty average stands at 69%, while urban areas have a lower poverty rate of 60%. The accuracy of prediction ranged between 67.21% and 98.36% for the advanced ML methods, with the RF model demonstrating the best performance. The results reveal geographical region, household size, respondent age group, husband employment status, age of household head, and place of residence as the top six predictors of poverty in Somalia. The findings highlight the potential of ML methods to predict poverty and uncover hidden information that traditional statistical methods cannot detect, with the RF model identified as the best classifier for predicting poverty in Somalia.
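
The evaluation metrics named in this abstract can all be derived from the four cells of a binary confusion matrix. The following is a minimal Python sketch, not the authors' R code; the function name and the counts are hypothetical and are not taken from the study. Note that sensitivity and recall are the same quantity.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common binary-classification metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (no handling of empty classes).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)  # identical to recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only (poor = positive class).
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=90)
```

AUROC is not included here because it requires predicted scores rather than a single confusion matrix.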


Subjects
Benchmarking, Machine Learning, Cross-Sectional Studies, Somalia, Poverty
2.
Math Biosci Eng ; 20(11): 19871-19911, 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-38052628

ABSTRACT

Recent innovations have focused on the creation of new families that extend well-known distributions while providing a great deal of practical flexibility for data modeling. Weighted distributions offer an effective approach for addressing model building and data interpretation problems. The main objective of this work is to provide a novel family based on a weighted generator called the length-biased truncated Lomax-generated (LBTLo-G) family. The characteristics of the LBTLo-G family are discussed, including expressions for the probability density function, moments, and incomplete moments. In addition, different measures of uncertainty are determined. We provide four new sub-distributions and investigate their functionalities. Subsequently, a statistical analysis is given. The LBTLo-G family's parameter estimation is carried out using the maximum likelihood technique on the basis of full and censored samples. A simulation study is conducted to evaluate the parameter estimates of the LBTLo Weibull (LBTLoW) distribution. Four real data sets are considered to illustrate the fitting behavior of the LBTLoW distribution. In each case, the application outcomes demonstrate that the LBTLoW distribution can, in fact, fit the data more accurately than other rival distributions.

3.
Immun Inflamm Dis ; 11(8): e981, 2023 08.
Article in English | MEDLINE | ID: mdl-37647450

ABSTRACT

BACKGROUND: Accessibility to the immense collection of studies on noncommunicable diseases related to coronavirus disease of 2019 (COVID-19) and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is an immediate focus of researchers. However, there is a scarcity of information about chronic obstructive pulmonary disease (COPD), which is associated with a high rate of infection in COVID-19 patients. Moreover, by examining the combined effects of SARS-CoV-2 on COPD patients, we may be able to overcome formidable obstacles, factors, and diagnosis influencers. MATERIALS AND METHODS: A retrospective study of 280 patients was conducted at DHQ Hospital Muzaffargarh in Punjab, Pakistan. Negative binomial regression describes the risk of fixed successive variables. The association is described by the Cox proportional hazard model and the model coefficient is determined through log-likelihood observation. Patients with COPD had their survival and mortality plotted on Kaplan-Meier curves. RESULTS: The increased risk of death in COPD patients was due to the effects of variables such as cough, lower respiratory tract infection (LRTI), tuberculosis (TB), and body aches, being 1.369 (95% confidence interval [CI]: 0.747-1.992), 0.693 (95% CI: 0.231-1.156), 0.170 (95% CI: 0.008-0.332), and 0.217 (95% CI: -0.07 to 0.440) times higher, respectively, while it decreased by 0.396 in normal condition. CONCLUSION: We found that the symptoms of COPD (cough, LRTI, TB, and body aches) are statistically significant in patients who were most infected by SARS-CoV-2.
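
The Kaplan-Meier curves mentioned in the methods rest on the product-limit estimator, which multiplies the conditional survival fractions at each observed event time. The following is a minimal Python sketch of that estimator; the function name and the follow-up data are hypothetical and are not taken from the study.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each subject
    events: 1 if the event (e.g. death) was observed, 0 if censored
    """
    data = sorted(zip(durations, events))  # order subjects by follow-up time
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same follow-up time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set after t.
        n_at_risk -= removed
    return curve


# Hypothetical follow-up times (days) and event indicators.
curve = kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0])
```

Censored subjects contribute to the risk set up to their last follow-up time but never trigger a drop in the curve, which is why the estimate only changes at times 5, 8, and 12 in this example.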


Subjects
COVID-19, Chronic Obstructive Pulmonary Disease, Respiratory Tract Infections, Humans, COVID-19/epidemiology, SARS-CoV-2, Retrospective Studies, Cough, Pakistan/epidemiology, Risk Factors, Chronic Obstructive Pulmonary Disease/epidemiology
4.
Biology (Basel) ; 12(7)2023 Jul 04.
Article in English | MEDLINE | ID: mdl-37508389

ABSTRACT

Predictive models based on empirical similarity are instrumental in biology and data science, where the premise is to measure the likeness of one observation with others in the same dataset. Biological datasets often encompass data that can be categorized. When using empirical similarity-based predictive models, two strategies for handling categorical covariates exist. The first strategy retains categorical covariates in their original form, applying distance measures and allocating weights to each covariate. In contrast, the second strategy creates binary variables, representing each variable level independently, and computes similarity measures solely through the Euclidean distance. This study performs a sensitivity analysis of these two strategies using computational simulations, and applies the results to a biological context. We use a linear regression model as a reference point, and consider two methods for estimating the model parameters, alongside exponential and fractional inverse similarity functions. The sensitivity is evaluated by determining the coefficient of variation of the parameter estimators across the three models as a measure of relative variability. Our results suggest that the first strategy excels over the second one in effectively dealing with categorical variables, and offers greater parsimony due to the use of fewer parameters.
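
The two strategies for categorical covariates can be contrasted in a small Python sketch. Everything here is an assumption made for illustration: the weighted mismatch distance used for the first strategy, the similarity-weighted average used as the predictor, the function names, and the toy data. The abstract does not specify the exact distance measures, weights, or estimation methods the authors used.

```python
import math


def exponential_similarity(d):
    return math.exp(-d)


def fractional_inverse_similarity(d):
    return 1.0 / (1.0 + d)


def mismatch_distance(x, z, weights):
    """Strategy 1 (assumed form): weighted count of covariates whose categories differ."""
    return sum(w * (a != b) for w, a, b in zip(weights, x, z))


def one_hot(x, levels):
    """Strategy 2: one binary indicator per level of each covariate."""
    return [1.0 if value == level else 0.0
            for value, covariate_levels in zip(x, levels)
            for level in covariate_levels]


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def predict(similarities, responses):
    """Similarity-weighted average of the observed responses."""
    return sum(s * y for s, y in zip(similarities, responses)) / sum(similarities)


# Hypothetical observations: two categorical covariates and a numeric response.
X = [("A", "low"), ("B", "low"), ("A", "high")]
y = [1.0, 3.0, 2.0]
new = ("A", "low")

# Strategy 1: keep categories as they are, one weight per covariate.
weights = [1.0, 1.0]
sims_1 = [fractional_inverse_similarity(mismatch_distance(x, new, weights)) for x in X]
pred_1 = predict(sims_1, y)

# Strategy 2: binary indicators for every level, plain Euclidean distance.
levels = [("A", "B"), ("low", "high")]
new_encoded = one_hot(new, levels)
sims_2 = [fractional_inverse_similarity(euclidean_distance(one_hot(x, levels), new_encoded))
          for x in X]
pred_2 = predict(sims_2, y)
```

In this sketch the first strategy carries one weight per covariate, whereas a weighted version of the second would need one per level, which is consistent with the parsimony argument made in the abstract.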

5.
J Appl Stat ; 50(1): 131-154, 2023.
Article in English | MEDLINE | ID: mdl-36530782

ABSTRACT

This article introduces a new distribution with two tuning parameters specified on the unit interval. It follows from a 'hyperbolic secant transformation' of a random variable following the Weibull distribution. The lack of research on the prospect of hyperbolic transformations providing flexible distributions over the unit interval is a motivation for the study. The main distributional structural properties of the new distribution are established. Different estimation methods for the model parameters are derived, and two simulation studies are carried out. Subsequently, we develop a related quantile regression model for further statistical perspectives. We consider two real data applications based on the educational measurements of both OECD countries and some non-member countries. Our regression model aims to relate the desire to get top grades among certain young students in the OECD countries to some components of their Education and School Life Index, such as reading performance, work environment at home, and paid work experience. It is shown that the elaborated quantile regression model has better fitting power than well-known regression models when the unit response variable has a skewed distribution, and that two independent variables are statistically significant at any standard significance level for the median response.

6.
Sensors (Basel) ; 22(10)2022 May 14.
Article in English | MEDLINE | ID: mdl-35632152

ABSTRACT

In this paper, we propose a new privatization mechanism based on a naive theory of a perturbation on a probability using wavelets, much as noise perturbs the signal of a digital image sensor. Wavelets are employed to extract information from a wide range of types of data, including audio signals and images often related to sensors, as unstructured data. Specifically, the cumulative wavelet integral function is defined to build the perturbation on a probability with the help of this function. We show that an arbitrary distribution function additively perturbed is still a distribution function, which can be seen as a privatized distribution, with the privatization mechanism being a wavelet function. Thus, we offer a mathematical method for choosing a suitable probability distribution for data by starting from some guessed initial distribution. Examples of the proposed method are discussed. Computational experiments were carried out using a database-sensor and two related algorithms. Several knowledge areas can benefit from the new approach proposed in this investigation. The areas of artificial intelligence, machine learning, and deep learning, which are closely related to sensors, constantly need techniques for data fitting. Therefore, we believe that the proposed privatization mechanism is an important contribution to increasing the spectrum of existing techniques.


Subjects
Artificial Intelligence, Privatization, Algorithms, Machine Learning, Probability
7.
Article in English | MEDLINE | ID: mdl-34886312

ABSTRACT

Criticism of the implementation of existing risk prediction models (RPMs) for cardiovascular diseases (CVDs) in new populations motivates researchers to develop regional models. The predominant usage of laboratory features in these RPMs is also causing reproducibility issues in low-middle-income countries (LMICs). Further, conventional logistic regression analysis (LRA) does not consider non-linear associations and interaction terms in developing these RPMs, which might oversimplify the phenomenon. This study aims to develop alternative machine learning (ML)-based RPMs that may perform better at predicting CVD status using nonlaboratory features in comparison to conventional RPMs. The data were based on a case-control study conducted at the Punjab Institute of Cardiology, Pakistan. Data from 460 subjects, aged between 30 and 76 years, with (1:1) gender-based matching, were collected. We tested various ML models to identify the best model/models considering LRA as a baseline RPM. An artificial neural network and a linear support vector machine outperformed the conventional RPM in the majority of performance metrics. The predictive accuracies of the best-performing ML-based RPMs were between 80.86 and 81.09% and were found to be higher than the 79.56% of the baseline RPM. The discriminating capabilities of the ML-based RPMs were also comparable to those of the baseline RPM. Further, ML-based RPMs identified substantially different orders of features as compared to the baseline RPM. This study concludes that nonlaboratory feature-based RPMs can be a good choice for early risk assessment of CVDs in LMICs. ML-based RPMs can identify a better order of features than the conventional approach, which subsequently provided models with improved prognostic capabilities.


Subjects
Cardiovascular Diseases, Adult, Aged, Cardiovascular Diseases/epidemiology, Case-Control Studies, Humans, Machine Learning, Middle Aged, Reproducibility of Results, Risk Assessment
8.
Entropy (Basel) ; 23(11)2021 Oct 24.
Article in English | MEDLINE | ID: mdl-34828091

ABSTRACT

In this article, we propose the exponentiated sine-generated family of distributions. Some important properties are demonstrated, such as the series representation of the probability density function, quantile function, moments, stress-strength reliability, and Rényi entropy. A particular member, called the exponentiated sine Weibull distribution, is highlighted; we analyze its skewness and kurtosis, moments, quantile function, residual mean and reversed mean residual life functions, order statistics, and extreme value distributions. Maximum likelihood estimation and Bayes estimation under the square error loss function are considered. Simulation studies are used to assess the techniques, and their performance gives satisfactory results as discussed by the mean square error, confidence intervals, and coverage probabilities of the estimates. The stress-strength reliability parameter of the exponentiated sine Weibull model is derived and estimated by the maximum likelihood estimation method. Also, nonparametric bootstrap techniques are used to approximate the confidence interval of the reliability parameter. A simulation is conducted to examine the mean square error, standard deviations, confidence intervals, and coverage probabilities of the reliability parameter. Finally, three real applications of the exponentiated sine Weibull model are provided. One of them considers stress-strength data.

9.
Children (Basel) ; 8(11)2021 Nov 04.
Article in English | MEDLINE | ID: mdl-34828722

ABSTRACT

Malnutrition among children is an important public health problem in Pakistan. Conventional indicators (stunting, wasting and underweight) are well known. However, there is a need for aggregate indicators in this perspective. The goal of this study is to assess the prevalence and trends of malnutrition among Pakistani children under the age of five using the so-called composite index of anthropometric failure (CIAF), a tool for calculating the whole aggregate burden of malnutrition. The data were extracted from the Pakistan Demographic and Health Survey 2012-2013. Mothers' education and socioeconomic status (SES) were assessed as important factors in malnutrition. Chi-squared analysis was used to check the bivariate association, and multiple logistic regression was used to identify the significant correlates of child malnutrition. Moreover, multiple correspondence analysis (MCA) was applied to strengthen the use of CIAF as an outcome variable. The study looked at 3071 children under the age of five, with 52.2% of them classified as malnourished by the CIAF. Children of educated mothers had 43% lower odds of being malnourished (OR (Odds Ratio) = 0.57, 95% CI (Confidence Interval) = 0.44-0.73). Additionally, a decreasing trend in malnutrition was found with increasing SES. There is a need to improve maternal education. Programs focusing on increasing women's autonomy in making home decisions should be established. Furthermore, long-term interventions for improving home SES and effective nutritional methods should be examined. For policymakers, the use of CIAF is suggested since it provides an estimate of the entire burden of undernutrition.
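
The link between the reported odds ratio and the percentage reduction in odds can be sketched in Python. The first helper converts an odds ratio into a percent change, which reproduces the 43% figure from OR = 0.57. The second computes an unadjusted odds ratio with a Wald 95% confidence interval from a 2x2 table; this is a simplification, since the study's estimate comes from multiple logistic regression, and the function names and counts below are hypothetical and are not taken from the survey.

```python
import math


def percent_change_in_odds(odds_ratio):
    """Percent change in odds implied by an odds ratio (negative = lower odds)."""
    return (odds_ratio - 1) * 100


def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome,   b: exposed without outcome
    c: unexposed with outcome, d: unexposed without outcome
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Reported OR of 0.57 corresponds to 43% lower odds.
change = percent_change_in_odds(0.57)

# Hypothetical 2x2 counts for illustration only.
estimate, lower, upper = odds_ratio_with_ci(a=40, b=60, c=50, d=50)
```

An interval that excludes 1, such as the reported 0.44-0.73, indicates an association that is statistically significant at the 5% level.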

10.
Entropy (Basel) ; 23(8)2021 Aug 21.
Article in English | MEDLINE | ID: mdl-34441228

ABSTRACT

In this article, the "truncated-composed" scheme was applied to the Burr X distribution to motivate a new family of univariate continuous-type distributions, called the truncated Burr X generated family. It is mathematically simple and provides more modeling freedom for any parental distribution. Additional functionality is conferred on the probability density and hazard rate functions, improving their peak, asymmetry, tail, and flatness levels. These characteristics are represented analytically and graphically with three special distributions of the family derived from the exponential, Rayleigh, and Lindley distributions. Subsequently, we conducted asymptotic, first-order stochastic dominance, series expansion, Tsallis entropy, and moment studies. Useful risk measures were also investigated. The remainder of the study was devoted to the statistical use of the associated models. In particular, we developed an adapted maximum likelihood methodology aiming to efficiently estimate the model parameters. The special distribution extending the exponential distribution was applied as a statistical model to fit two sets of actuarial and financial data. It performed better than a wide variety of selected competing non-nested models. Numerical applications for risk measures are also given.

11.
Article in English | MEDLINE | ID: mdl-34360135

ABSTRACT

Diet management or caloric restriction for diabetes mellitus patients is essential in order to reduce the disease's burden. Mathematical programming problems can help in this regard; they have a central role in optimal diet management and in the nutritional balance of food recipes. The present study employed linear optimization models such as linear, pre-emptive, and non-pre-emptive goal programming problems (LPP, PGP and NPGP) to minimize the deviations of over- and under-achievement of specific nutrients for optimal selection of food menus with various energy (calorie) levels. Sixty-two food recipes, all selected because they are commonly available to the Indian population, are considered, and dietary intake for meal planning is developed through the optimization models. The results suggest that a variety of Indian food recipes with low glycemic values can be chosen to help manage the varying glucose levels (>200 mg/dL) of Indian diabetes patients.


Subjects
Diabetes Mellitus, Menu Planning, Diabetes Mellitus/prevention & control, Diet, Energy Intake, Goals, Humans
12.
An Acad Bras Cienc ; 93(2): e20181019, 2021.
Article in English | MEDLINE | ID: mdl-34190839

ABSTRACT

In this paper, we introduce a new family of distributions whose probability density function is defined as a weighted sum of two probability density functions; one is defined as a warped version of the other. We focus our attention on a special case based on the exponential distribution with three parameters, a dilation transformation and a weight with polynomial decay, leading to a new lifetime distribution. The explicit expressions of the moment generating function, moments and quantile function of the proposed distribution are provided. For estimating the parameters, the method of maximum likelihood estimation is used. Two applications with practical data sets are given.


Subjects
Algorithms, Statistical Models, Likelihood Functions, Statistical Distributions
13.
PLoS One ; 16(5): e0250790, 2021.
Article in English | MEDLINE | ID: mdl-33974643

ABSTRACT

In recent years, the trigonometric families of continuous distributions have found a place of choice in the theory and practice of statistics, with the Sin-G family as leader. In this paper, we provide some contributions to the subject by introducing a flexible extension of the Sin-G family, called the transformed Sin-G family. It is constructed from a new polynomial-trigonometric function presenting a desirable "versatile concave/convex" property, among others. The modelling possibilities of the former Sin-G family are thus multiplied. This potential is also highlighted by a complete theoretical work, showing stochastic ordering results, studying the analytical properties of the main functions, deriving several kinds of moments, and discussing the reliability parameter as well. Then, the applied side of the proposed family is investigated, with numerical results and applications on the related models. In particular, the estimation of the unknown model parameters is performed through the use of the maximum likelihood method. Then, two real-life data sets are analyzed by a new extended Weibull model derived from the considered trigonometric mechanism. We show that it performs the best among seven comparable models, illustrating the importance of the findings.


Subjects
Statistics as Topic, Statistical Models
14.
Int J Sports Physiol Perform ; 16(11): 1692-1699, 2021 11 01.
Article in English | MEDLINE | ID: mdl-33975279

ABSTRACT

PURPOSE: To measure core temperature (Tcore) in open-water (OW) swimmers during a 25-km competition and identify the predictors of Tcore drop and hypothermia-related dropouts. METHODS: Twenty-four national- and international-level OW swimmers participated in the study. Participants completed a personal questionnaire and a body fat/muscle mass assessment before the race. The average speed was calculated on each lap over a 2500-m course. Tcore was continuously recorded via an ingestible temperature sensor (e-Celsius, BodyCap). Hypothermia-related dropouts (H group) were compared with finishers (nH group). RESULTS: Average prerace Tcore was 37.5°C (0.3°C) (N = 21). Seven participants dropped out due to hypothermia (H, n = 7) with a mean Tcore at dropout of 35.3°C (1.5°C). Multiple logistic regression analysis found that body fat percentage and initial Tcore were associated with hypothermia (G2 = 17.26, P < .001). Early Tcore drop ≤37.1°C at 2500 m was associated with a greater rate of hypothermia-related dropouts (71.4% vs 14.3%, P = .017). Multiple linear regression found that body fat percentage and previous participation were associated with Tcore drop (F = 4.95, P = .019). There was a positive correlation between the decrease in speed and Tcore drop (r = .462, P < .001). CONCLUSIONS: During an OW 25-km competition at 20°C to 21°C, lower initial Tcore and lower body fat, as well as premature Tcore drop, were associated with an increased risk of hypothermia-related dropout. Lower body fat and no previous participation, as well as a decrease in swimming speed, were associated with Tcore drop.


Subjects
Hypothermia, Body Temperature/physiology, Humans, Hypothermia/etiology, Risk Factors, Swimming/physiology, Water
15.
PLoS One ; 16(3): e0249027, 2021.
Article in English | MEDLINE | ID: mdl-33784310

ABSTRACT

The estimation of the entropy of a random system or process is of interest in many scientific applications. The aim of this article is the analysis of the entropy of the well-known Kumaraswamy distribution, an aspect that, surprising as it may seem, has not previously received particular attention. With this in mind, six different entropy measures are considered and expressed analytically via the beta function. A numerical study is performed to discuss the behavior of these measures. Subsequently, we investigate their estimation through a semi-parametric approach combining the obtained expressions and the maximum likelihood estimation approach. Maximum likelihood estimates for the considered entropy measures are thus derived. The convergence properties of these estimates are demonstrated through simulated data, showing their numerical efficiency. Concrete applications to two real data sets are provided.


Subjects
Entropy, Statistics as Topic, Computer Simulation, Floods, Geologic Sediments/chemistry, Likelihood Functions, Computer-Assisted Numerical Analysis
16.
J Appl Stat ; 48(1): 124-137, 2021.
Article in English | MEDLINE | ID: mdl-35707233

ABSTRACT

In this paper, a new two-parameter discrete distribution is introduced. It belongs to the family of the weighted geometric distribution (GD), with the feature of using a particular trigonometric weight. This configuration adds an oscillating property to the former GD which can be helpful in analyzing the data with over-dispersion, as developed in this study. First, we present the basic statistical properties of the new distribution, including the cumulative distribution function, hazard rate function and moment generating function. Estimation of the related model parameters is investigated using the maximum likelihood method. A simulation study is performed to illustrate the convergence of the estimators. Applications to two practical datasets are given to show that the new model performs at least as well as some competitors.

17.
J Appl Stat ; 48(16): 3002-3024, 2021.
Article in English | MEDLINE | ID: mdl-35707257

ABSTRACT

In this paper, we develop a new general class of skew distributions with flexibility properties on the tails. Moreover, such a class can provide heavy and light tails. Some of its mathematical properties are studied, including the quantile function, the moments, the moment generating function and the mean of deviations. New skew distributions are derived and used to construct new models capturing asymmetry inherent to data. The estimation of the class parameters is investigated by the method of maximum likelihood and the performance of the estimators is assessed by a simulation study. Applications of the proposed distribution are explored for two climate data sets. The first data set concerns the annual heat wave index and the second data set involves temperature and precipitation measures from the meteorological station located at Schiphol, Netherlands. Data fitting results show that our models perform better than the competitors.

18.
Chaos ; 30(11): 113142, 2020 Nov.
Article in English | MEDLINE | ID: mdl-33261340

ABSTRACT

The purpose of this study is to discriminate sunflower seeds with the help of a dataset having spectral and textural features. Crop production depends on seed purity and quality, and sunflower seed is used worldwide for its oil content. In this regard, a dataset was built that categorizes sunflower seed varieties (Syngenta CG, HS360, S278, HS30, Armani, and High Sun 33), which were acquired from the agricultural farms of The Islamia University of Bahawalpur, Pakistan, into six classes. For preprocessing, a new region-oriented seed-based segmentation was deployed for the automatic selection of regions and extraction of 53 multi-features from each region, while 11 optimized fused multi-features were selected using the chi-square feature selection technique. For discrimination, four supervised classifiers, namely, deep learning J4, support vector machine, random committee, and Bayes net, were applied to the optimized multi-feature dataset. We observe very promising accuracies of 98.2%, 97.5%, 96.6%, and 94.8%, respectively, when the size of a region is (180 × 180).


Subjects
Helianthus, Bayes Theorem, Humans, Support Vector Machine
19.
Entropy (Basel) ; 22(3)2020 Mar 17.
Article in English | MEDLINE | ID: mdl-33286120

ABSTRACT

As a matter of fact, the statistical literature lacks a general family of distributions based on the truncated Cauchy distribution. In this paper, such a family is proposed, called the truncated Cauchy power-G family. It stands out for the originality of the involved functions, its overall simplicity and its desirable properties for modelling purposes. In particular, (i) only one parameter is added to the baseline distribution, avoiding the over-parametrization phenomenon, (ii) the related probability functions (cumulative distribution, probability density, hazard rate, and quantile functions) have tractable expressions, and (iii) thanks to the combined action of the arctangent and power functions, the flexible properties of the baseline distribution (symmetry, skewness, kurtosis, etc.) can be substantially enhanced. These aspects are discussed in detail, with the support of comprehensive numerical and graphical results. Furthermore, important mathematical features of the new family are derived, such as the moments, skewness and kurtosis, two kinds of entropy and order statistics. On the applied side, new models can be created for fitting data sets with simple or complex structure. This last point is illustrated by the consideration of the Weibull distribution as baseline, the maximum likelihood method of estimation and two practical data sets with different skewness properties. The obtained results show that the truncated Cauchy power-G family is very competitive in comparison to other well-established general families.

20.
Entropy (Basel) ; 22(4)2020 Apr 15.
Article in English | MEDLINE | ID: mdl-33286223

ABSTRACT

The inverse Rayleigh distribution finds applications in many lifetime studies, but does not have enough overall flexibility to model lifetime phenomena where moderately right-skewed or near-symmetrical data are observed. This paper proposes a solution by introducing a new two-parameter extension of this distribution through the use of the half-logistic transformation. The first contribution is theoretical: we provide a comprehensive account of its mathematical properties, specifically stochastic ordering results, a general linear representation for the exponentiated probability density function, raw/inverted moments, incomplete moments, skewness, kurtosis, and entropy measures. Evidence shows that the related model can accommodate the treatment of lifetime data with different right-skewed features, far beyond what is possible with the former inverse Rayleigh model. We illustrate this aspect by exploring the statistical inference of the new model. Five different classical methods for the estimation of the model parameters are employed, with a simulation study comparing the numerical behavior of the different estimates. The estimation of entropy measures is also discussed numerically. Finally, two practical data sets are used as applications to attest to the usefulness of the new model, with favorable goodness-of-fit results in comparison to three recent extended inverse Rayleigh models.
